Forecasting ensembles and combinations

Rob J Hyndman

15 November 2022

What is a forecast?

Where is Melbourne?

Australian monthly café turnover

Mean (or point) forecast

Interval forecast

Quantile forecasts

Tidyverts packages

Combining point forecasts

library(tidyverse)
library(tsibble)
library(fable)
auscafe
# A tsibble: 168 x 2 [1M]
      Month Turnover
      <mth>    <dbl>
 1 2006 Jan    1914.
 2 2006 Feb    1750 
 3 2006 Mar    1984.
 4 2006 Apr    1967.
 5 2006 May    2005.
 6 2006 Jun    1944.
 7 2006 Jul    2019.
 8 2006 Aug    2043.
 9 2006 Sep    2039.
10 2006 Oct    2113.
# … with 158 more rows

Using the fable package

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ets = ETS(Turnover),
    arima = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    snaive = SNAIVE(Turnover)
  )
# A mable: 1 x 3
           ets                     arima   snaive
       <model>                   <model>  <model>
1 <ETS(M,A,M)> <ARIMA(0,1,1)(0,1,1)[12]> <SNAIVE>

Using the fable package

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ets = ETS(Turnover),
    arima = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    snaive = SNAIVE(Turnover)
  ) |>
  mutate(comb = (ets + arima + snaive)/3)
# A mable: 1 x 4
           ets                     arima   snaive          comb
       <model>                   <model>  <model>       <model>
1 <ETS(M,A,M)> <ARIMA(0,1,1)(0,1,1)[12]> <SNAIVE> <COMBINATION>

Using the fable package

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ets = ETS(Turnover),
    arima = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    snaive = SNAIVE(Turnover)
  ) |>
  mutate(comb = (ets + arima + snaive)/3) |> 
  forecast(h = "1 year")
# A fable: 48 x 4 [1M]
# Key:     .model [4]
   .model    Month       Turnover .mean
   <chr>     <mth>         <dist> <dbl>
 1 ets    2019 Jan  N(3839, 4272) 3839.
 2 ets    2019 Feb  N(3514, 4587) 3514.
 3 ets    2019 Mar  N(3889, 6892) 3889.
 4 ets    2019 Apr  N(3809, 7868) 3809.
 5 ets    2019 May  N(3856, 9385) 3856.
 6 ets    2019 Jun N(3738, 10098) 3738.
 7 ets    2019 Jul N(3951, 12748) 3951.
 8 ets    2019 Aug N(4008, 14670) 4008.
 9 ets    2019 Sep N(3968, 15941) 3968.
10 ets    2019 Oct N(4100, 18726) 4100.
# … with 38 more rows

Combining point forecasts

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ets = ETS(Turnover),
    arima = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    snaive = SNAIVE(Turnover)
  ) |>
  mutate(comb = (ets + arima + snaive)/3) |>
  forecast(h = "1 year") |>
  accuracy(
    data = auscafe,
    measures = list(mse=MSE)
  ) |>
  arrange(mse)
# A tibble: 4 × 3
  .model .type    mse
  <chr>  <chr>  <dbl>
1 comb   Test    956.
2 ets    Test   1688.
3 arima  Test   2647.
4 snaive Test  12525.

Forecasting competitions

Galton (1907)

According to the democratic principle of “one vote one value”, the middlemost estimate expresses the vox populi.

Newbold and Granger (1974)

“The combined forecasting methods seem to me to be non-starters… Is a combined method not in danger of falling between two stools?” — Maurice Priestley

Makridakis competition (1982)

Spyros Makridakis

The first genuine forecasting competition

  • 1001 series from demography, industry, economics.
  • Highly controversial at the time.
  • Showed combinations outperformed individual methods.

Conclusions

  1. Sophisticated methods do not necessarily provide more accurate forecasts than simpler ones.
  2. The relative ranking of the various methods varies according to the accuracy measure being used.
  3. The accuracy of combinations is usually better than individual methods.
  4. The accuracy of methods depends on the length of the forecasting horizon involved.

Consequences

Researchers started to:

  • take combination methods seriously;
  • consider how to automate forecasting methods;
  • study what methods give the best forecasts;
  • be aware of the dangers of over-fitting;
  • treat forecasting as a different problem from time series analysis.

M4 competition (2020)

  • January – May 2018
  • 100,000 time series: yearly, quarterly, monthly, weekly, daily, hourly.
  • Point forecasts and prediction intervals assessed.
  • Code must be public.
  • 248 registrations, 50 submissions.

Winning methods

  1. Hybrid of Recurrent Neural Network and Exponential Smoothing models
  2. Forecast combination using xgboost to find weights

Combining point forecasts

Weighted averages

Suppose we have N different forecasting methods.

  • \hat{\bm{y}}_{T+h|T} = N-vector of h-step-ahead forecasts using information up to time T.

  • \bm{\Sigma}_{T+h|T}= N\times N covariance matrix of the h-step forecast errors.

  • \bm{1}= vector of ones.

  • Simple combination: \tilde{y}_{T+h|T} = N^{-1}\bm{1}'\hat{\bm{y}}_{T+h|T}

  • Linear combination: \tilde{y}_{T+h|T} = \bm{w}'_{T+h|T}\hat{\bm{y}}_{T+h|T}

Optimal combination (minimizing MSE):

\bm{w}_{T+h|T} = \frac{\bm{\Sigma}_{T+h|T}^{-1}\bm{1}}{\bm{1}'\bm{\Sigma}_{T+h|T}^{-1}\bm{1}} (Bates & Granger, 1969; Newbold & Granger, 1974)
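The optimal-weight formula can be evaluated directly; a minimal sketch, where Sigma is a made-up covariance matrix for three hypothetical methods (not estimated from auscafe):

```r
# Hypothetical covariance matrix of h-step forecast errors for 3 methods
Sigma <- matrix(c(4, 1, 0,
                  1, 9, 2,
                  0, 2, 16), nrow = 3, byrow = TRUE)
ones <- rep(1, 3)
# w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
w <- solve(Sigma, ones) / sum(solve(Sigma, ones))
sum(w)  # the weights sum to 1
```

Note that the method with the largest error variance receives the smallest weight.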

Bates and Granger (1969)

Variations

Regression weights

Regress y_{t+h} against \hat{\bm{y}}_{t+h|t} for t=1,\dots,T.

  • Gives identical results to minimum MSE approach if weights constrained to sum to 1 and no intercept.
  • Forecasts are highly correlated, so principal component regression works better.
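A minimal sketch of the regression-weight idea using simulated data (none of this comes from auscafe): regress the observations on the individual forecasts with no intercept, and read the coefficients as combination weights.

```r
set.seed(1)
y  <- 100 + cumsum(rnorm(50))  # simulated observations
f1 <- y + rnorm(50, sd = 2)    # simulated forecasts, method 1
f2 <- y + rnorm(50, sd = 4)    # simulated forecasts, method 2
fit <- lm(y ~ f1 + f2 - 1)     # no intercept
coef(fit)                      # unconstrained regression weights
```

Constraining these coefficients to sum to 1 reproduces the minimum-MSE weights.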

Inverse variance weights

  • More weight on methods with smaller forecast variance

AIC weights

  • More weight on methods with lower AIC values
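Both schemes fit in a few lines. The variances and AIC values below are invented for illustration, and the AIC weighting shown uses the common exp(-ΔAIC/2) convention.

```r
sigma2 <- c(4, 9, 16)                    # hypothetical forecast error variances
w_iv <- (1 / sigma2) / sum(1 / sigma2)   # inverse-variance weights

aic <- c(210, 212, 220)                  # hypothetical AIC values
delta <- aic - min(aic)
w_aic <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike-style weights
```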

Yet more variations

  • Bayesian weights
  • Nonlinear combinations
  • Sparse combinations
  • Meta-learning weights

Forecast combination puzzle

Simple average combinations almost always give more accurate forecasts than other combination methods.

  • Due to the error arising from estimating the weights.
  • Sample sizes are rarely large enough to do better than simple averaging.
  • Optimal weights change over time, making estimation even more difficult.
  • Successful combination strategies have used huge collections of time series.

FFORMA

Feature-based FORecast Model Averaging

  1. Take 100,000 series from M4 competition.
  2. For each series produce forecasts from 8 different models.
  3. For each series compute a large number of features (e.g., length, strength of seasonality, strength of trend, spectral entropy, autocorrelations, Hurst exponent, unit root test statistics, etc.).
  4. Train xgboost to learn weights of forecast combination using features as inputs.
  • The probability of each model being best is used to construct a model weight.
  • A combination forecast is produced using these weights.
  • Came second in the M4 competition.

Combining probabilistic forecasts

Ensemble forecasting

Ensemble forecasting: mix the forecast distributions from multiple models.

Ensemble forecasting

auscafe |>
  filter(year(Month) <= 2018) |>
  model(ETS = ETS(Turnover),
        ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1))
  ) |>
  forecast(h = "1 year")
# A fable: 24 x 4 [1M]
# Key:     .model [2]
   .model    Month       Turnover .mean
   <chr>     <mth>         <dist> <dbl>
 1 ETS    2019 Jan  N(3839, 4272) 3839.
 2 ETS    2019 Feb  N(3514, 4587) 3514.
 3 ETS    2019 Mar  N(3889, 6892) 3889.
 4 ETS    2019 Apr  N(3809, 7868) 3809.
 5 ETS    2019 May  N(3856, 9385) 3856.
 6 ETS    2019 Jun N(3738, 10098) 3738.
 7 ETS    2019 Jul N(3951, 12748) 3951.
 8 ETS    2019 Aug N(4008, 14670) 4008.
 9 ETS    2019 Sep N(3968, 15941) 3968.
10 ETS    2019 Oct N(4100, 18726) 4100.
# … with 14 more rows

Ensemble forecasting

auscafe |>
  filter(year(Month) <= 2018) |>
  model(ETS = ETS(Turnover),
        ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1))
  ) |>
  forecast(h = "1 year") |>
  summarise(
    Turnover = dist_mixture(
                Turnover[1], Turnover[2],
                weights=c(0.5,0.5))
  ) |>
  mutate(.model = "ENSEMBLE")
# A fable: 12 x 3 [1M]
      Month     Turnover .model  
      <mth>       <dist> <chr>   
 1 2019 Jan mixture(n=2) ENSEMBLE
 2 2019 Feb mixture(n=2) ENSEMBLE
 3 2019 Mar mixture(n=2) ENSEMBLE
 4 2019 Apr mixture(n=2) ENSEMBLE
 5 2019 May mixture(n=2) ENSEMBLE
 6 2019 Jun mixture(n=2) ENSEMBLE
 7 2019 Jul mixture(n=2) ENSEMBLE
 8 2019 Aug mixture(n=2) ENSEMBLE
 9 2019 Sep mixture(n=2) ENSEMBLE
10 2019 Oct mixture(n=2) ENSEMBLE
11 2019 Nov mixture(n=2) ENSEMBLE
12 2019 Dec mixture(n=2) ENSEMBLE

Ensemble forecasting

auscafe |>
  filter(year(Month) <= 2018) |>
  model(ETS = ETS(Turnover),
        ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1))
  ) |>
  forecast(h = "1 year") |>
  summarise(
    Turnover = dist_mixture(
                Turnover[1], Turnover[2],
                weights=c(0.5,0.5))
  ) |>
  mutate(.model = "ENSEMBLE") |>
  accuracy(
    data = auscafe,
    measures = list(crps=CRPS, rmse=RMSE)
  )
# A tibble: 1 × 4
  .model   .type  crps  rmse
  <chr>    <chr> <dbl> <dbl>
1 ENSEMBLE Test   31.7  45.1

Combination vs ensemble forecasting

Evaluating quantile forecasts

Evaluating quantile forecasts

\begin{align*}
f_{p,t} &= \text{quantile forecast with prob. $p$ at time $t$,}\\
y_{t} &= \text{observation at time $t$.}
\end{align*}

Quantile score

Q_{p,t} = \begin{cases}
  2(1 - p) \big|y_t - f_{p,t}\big|, & \text{if $y_{t} < f_{p,t}$}\\
  2p \big|y_{t} - f_{p,t}\big|, & \text{if $y_{t} \ge f_{p,t}$}
\end{cases}

  • Low Q_{p,t} is good
  • Multiplier of 2 often omitted, but useful for interpretation
  • Q_{p,t} like absolute error (weighted to account for likely exceedance)
  • Average Q_{p,t} over p gives the CRPS (Continuous Ranked Probability Score)

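
The quantile score translates directly into R. The helper below is a sketch (not a fable function); averaging it over a fine grid of probabilities approximates the CRPS, here for an assumed N(100, 10^2) forecast distribution and observation 95.

```r
quantile_score <- function(y, f, p) {
  # 2(1-p)|y - f| if y < f, else 2p|y - f|
  ifelse(y < f, 2 * (1 - p) * abs(y - f), 2 * p * abs(y - f))
}
p <- seq(0.01, 0.99, by = 0.01)
f <- qnorm(p, mean = 100, sd = 10)  # forecast quantiles
mean(quantile_score(95, f, p))      # approximate CRPS
```
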
auscafe
# A tsibble: 168 x 2 [1M]
      Month Turnover
      <mth>    <dbl>
 1 2006 Jan    1914.
 2 2006 Feb    1750 
 3 2006 Mar    1984.
 4 2006 Apr    1967.
 5 2006 May    2005.
 6 2006 Jun    1944.
 7 2006 Jul    2019.
 8 2006 Aug    2043.
 9 2006 Sep    2039.
10 2006 Oct    2113.
# … with 158 more rows

Evaluating quantile forecasts

auscafe |>
  filter(year(Month) <= 2018)
# A tsibble: 156 x 2 [1M]
      Month Turnover
      <mth>    <dbl>
 1 2006 Jan    1914.
 2 2006 Feb    1750 
 3 2006 Mar    1984.
 4 2006 Apr    1967.
 5 2006 May    2005.
 6 2006 Jun    1944.
 7 2006 Jul    2019.
 8 2006 Aug    2043.
 9 2006 Sep    2039.
10 2006 Oct    2113.
# … with 146 more rows

Evaluating quantile forecasts

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1))
  )
# A mable: 1 x 2
           ETS                     ARIMA
       <model>                   <model>
1 <ETS(M,A,M)> <ARIMA(0,1,1)(0,1,1)[12]>

Evaluating quantile forecasts

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1))
  ) |>
  forecast(h = "1 year")
# A fable: 24 x 4 [1M]
# Key:     .model [2]
   .model    Month       Turnover .mean
   <chr>     <mth>         <dist> <dbl>
 1 ETS    2019 Jan  N(3839, 4272) 3839.
 2 ETS    2019 Feb  N(3514, 4587) 3514.
 3 ETS    2019 Mar  N(3889, 6892) 3889.
 4 ETS    2019 Apr  N(3809, 7868) 3809.
 5 ETS    2019 May  N(3856, 9385) 3856.
 6 ETS    2019 Jun N(3738, 10098) 3738.
 7 ETS    2019 Jul N(3951, 12748) 3951.
 8 ETS    2019 Aug N(4008, 14670) 4008.
 9 ETS    2019 Sep N(3968, 15941) 3968.
10 ETS    2019 Oct N(4100, 18726) 4100.
# … with 14 more rows

Evaluating quantile forecasts

auscafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1))
  ) |>
  forecast(h = "1 year") |>
  accuracy(data = auscafe,
    measures = list(crps=CRPS, rmse=RMSE)
  ) |>
  arrange(crps)
# A tibble: 2 × 4
  .model .type  crps  rmse
  <chr>  <chr> <dbl> <dbl>
1 ETS    Test   31.3  41.1
2 ARIMA  Test   32.9  51.5

Forecasting COVID-19 cases

Australian Health Protection Principal Committee

Data sources

  • Case-level data of all positive COVID-19 tests: onset and detection times.
  • Daily population mobility data from Google, Apple & Facebook
  • Weekly non-household contact surveys
  • Weekly behavioural surveys
  • Daily case numbers from many countries and regions via the Johns Hopkins COVID-19 repository

Case numbers

  • Recent case numbers are uncertain and incomplete, as the date of onset is not known until symptoms appear and a test is obtained.

A model ensemble

Model 1: SEEIIR (Uni Melbourne/Doherty Institute)

  • Stochastic compartmental model with time-varying effective reproduction number.

Model 2: Generative model (Uni Adelaide)

  • Simulation with three types of infectious individuals: imported, asymptomatic, symptomatic

Model 3: Global AR model (Monash)

  • Single model fitted to all Johns Hopkins data from countries and regions with sufficient data.
  • Series with obvious anomalies removed.

Forecasting ensemble

  • Forecasts obtained from an equally-weighted mixture distribution of the component forecasts.
  • Also known as “linear pooling”.
  • Works best when individual models are over-confident and use different data sources.
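The linear pool is exactly what dist_mixture() builds in the earlier fable code; a standalone sketch with two assumed normal components (requires the distributional package):

```r
library(distributional)
# Equally-weighted mixture of two hypothetical forecast distributions
mix <- dist_mixture(dist_normal(100, 10), dist_normal(110, 20),
                    weights = c(0.5, 0.5))
mean(mix)  # weighted average of the component means
```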

Ensemble forecasts: Victoria

Forecastability factors

Something is easier to forecast if:

  1. we have a good understanding of the factors that contribute to it, and can measure them;
  2. there is lots of data available;
  3. the future is somewhat similar to the past;
  4. the forecasts cannot affect the thing we are trying to forecast.

Assessing forecast uncertainty

Evaluating probabilistic forecasts

CRPS: Continuous Ranked Probability Score

Forecasting many series

cafe
# A tsibble: 1,344 x 3 [1M]
# Key:       State [8]
      Month State Turnover
      <mth> <chr>    <dbl>
 1 2006 Jan ACT       21.7
 2 2006 Feb ACT       21.9
 3 2006 Mar ACT       24.9
 4 2006 Apr ACT       24.8
 5 2006 May ACT       27  
 6 2006 Jun ACT       24.5
 7 2006 Jul ACT       24  
 8 2006 Aug ACT       26.1
 9 2006 Sep ACT       26.2
10 2006 Oct ACT       33.7
# … with 1,334 more rows

Forecasting many series

cafe |>
  filter(year(Month) <= 2018)
# A tsibble: 1,248 x 3 [1M]
# Key:       State [8]
      Month State Turnover
      <mth> <chr>    <dbl>
 1 2006 Jan ACT       21.7
 2 2006 Feb ACT       21.9
 3 2006 Mar ACT       24.9
 4 2006 Apr ACT       24.8
 5 2006 May ACT       27  
 6 2006 Jun ACT       24.5
 7 2006 Jul ACT       24  
 8 2006 Aug ACT       26.1
 9 2006 Sep ACT       26.2
10 2006 Oct ACT       33.7
# … with 1,238 more rows

Forecasting many series

cafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    SNAIVE = SNAIVE(Turnover)
  ) |>
  mutate(
    COMB = (ETS+ARIMA)/2
  )
# A mable: 8 x 5
# Key:     State [8]
  State           ETS
  <chr>       <model>
1 ACT   <ETS(M,Ad,M)>
2 NSW   <ETS(M,Ad,M)>
3 NT     <ETS(A,N,A)>
4 QLD   <ETS(M,Ad,M)>
5 SA    <ETS(M,Ad,M)>
6 TAS    <ETS(A,N,A)>
7 VIC   <ETS(M,Ad,M)>
8 WA    <ETS(M,Ad,M)>
# … with 3 more variables: ARIMA <model>,
#   SNAIVE <model>, COMB <model>

Forecasting many series

cafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    SNAIVE = SNAIVE(Turnover)
  ) |>
  mutate(
    COMB = (ETS+ARIMA)/2
  ) |>
  forecast(h = "1 year")
# A fable: 384 x 5 [1M]
# Key:     State, .model [32]
   State .model    Month   Turnover .mean
   <chr> <chr>     <mth>     <dist> <dbl>
 1 ACT   ETS    2019 Jan N(61, 9.6)  60.7
 2 ACT   ETS    2019 Feb  N(64, 15)  63.8
 3 ACT   ETS    2019 Mar  N(72, 26)  71.9
 4 ACT   ETS    2019 Apr  N(67, 28)  67.0
 5 ACT   ETS    2019 May  N(70, 36)  69.8
 6 ACT   ETS    2019 Jun  N(67, 39)  67.3
 7 ACT   ETS    2019 Jul  N(68, 46)  68.4
 8 ACT   ETS    2019 Aug  N(70, 53)  69.7
 9 ACT   ETS    2019 Sep  N(69, 57)  68.5
10 ACT   ETS    2019 Oct  N(70, 66)  70.2
# … with 374 more rows

Forecasting many series

cafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    SNAIVE = SNAIVE(Turnover)
  ) |>
  mutate(
    COMB = (ETS+ARIMA)/2
  ) |>
  forecast(h = "1 year") |>
  accuracy(data = cafe,
    measures = list(crps=CRPS, rmse=RMSE)
  )
# A tibble: 32 × 5
   .model State .type  crps  rmse
   <chr>  <chr> <chr> <dbl> <dbl>
 1 ARIMA  ACT   Test   1.64  2.23
 2 ARIMA  NSW   Test  18.4  28.4 
 3 ARIMA  NT    Test   2.19  3.89
 4 ARIMA  QLD   Test  15.0  24.9 
 5 ARIMA  SA    Test   4.06  6.70
 6 ARIMA  TAS   Test   1.52  2.70
 7 ARIMA  VIC   Test  30.4  48.6 
 8 ARIMA  WA    Test   9.06 14.8 
 9 COMB   ACT   Test   2.02  3.31
10 COMB   NSW   Test  17.8  14.8 
# … with 22 more rows

Forecasting many series

cafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    SNAIVE = SNAIVE(Turnover)
  ) |>
  mutate(
    COMB = (ETS+ARIMA)/2
  ) |>
  forecast(h = "1 year") |>
  accuracy(data = cafe,
    measures = list(ss=skill_score(CRPS))
  )
# A tibble: 32 × 4
   .model State .type     ss
   <chr>  <chr> <chr>  <dbl>
 1 ARIMA  ACT   Test   0.465
 2 ARIMA  NSW   Test   0.331
 3 ARIMA  NT    Test  -0.359
 4 ARIMA  QLD   Test   0.402
 5 ARIMA  SA    Test   0.213
 6 ARIMA  TAS   Test   0.438
 7 ARIMA  VIC   Test  -0.813
 8 ARIMA  WA    Test   0.117
 9 COMB   ACT   Test   0.340
10 COMB   NSW   Test   0.355
# … with 22 more rows

Forecasting many series

cafe |>
  filter(year(Month) <= 2018) |>
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(Turnover ~ pdq(d=1) + PDQ(D=1)),
    SNAIVE = SNAIVE(Turnover)
  ) |>
  mutate(
    COMB = (ETS+ARIMA)/2
  ) |>
  forecast(h = "1 year") |>
  accuracy(data = cafe,
    measures = list(ss=skill_score(CRPS))
  ) |>
  group_by(.model) |>
  summarise(sspc = mean(ss) * 100)
# A tibble: 4 × 2
  .model  sspc
  <chr>  <dbl>
1 ARIMA   9.91
2 COMB   14.6 
3 ETS     6.23
4 SNAIVE  0   

Skill score is computed relative to seasonal naive forecasts (so SNAIVE scores 0)
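
The skill score used above is the proportional CRPS improvement over the benchmark; a sketch with assumed CRPS values:

```r
crps_snaive <- 12.5  # hypothetical benchmark (SNAIVE) CRPS
crps_comb   <- 10.7  # hypothetical combination CRPS
ss <- 1 - crps_comb / crps_snaive
ss * 100             # percentage improvement over SNAIVE
```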

For more information

Slides: robjhyndman.com/seminars/compstat2022
